In [ ]:
import numpy as np
A NumPy array is an object of type numpy.ndarray:
In [ ]:
a = np.arange(3)
type(a)
All ndarrays have a .base attribute.
If this attribute is not None, then the array is a view of some other object's memory, typically another ndarray.
This is a very powerful tool, because allocating memory and copying memory contents are expensive operations, but updating metadata on how to interpret some already allocated memory is cheap!
The simplest way of creating an array's view is by slicing it:
In [ ]:
a = np.arange(3)
a.base is None
In [ ]:
a[:].base is None
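To make the idea of shared memory concrete, here is a minimal sketch (the variable names are illustrative, not part of the workshop material) that writes through a slice and checks that the original array changes. It uses np.shares_memory as an independent check that no copy was made.

```python
import numpy as np

a = np.arange(5)
view = a[1:4]          # slicing creates a view, not a copy
copy = a[1:4].copy()   # an explicit copy owns its own memory

print(view.base is a)              # the view's base is the original array
print(copy.base is None)           # the copy has no base
print(np.shares_memory(a, view))   # view and original use the same buffer

view[0] = 100          # writing through the view...
print(a.tolist())      # ...changes the original: [0, 100, 2, 3, 4]
print(copy.tolist())   # the copy is unaffected: [1, 2, 3]
```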
Let's look more closely at what an array's metadata looks like. NumPy provides the np.info function, which can list for us some low level attributes of an array:
In [ ]:
np.info(a)
By the end of the workshop you will understand what most of these mean. But rather than listen through a lesson, you get to try and figure out what they mean yourself. To help you with that, here's a function that prints the information from two arrays side by side:
In [ ]:
def info_for_two(one_array, another_array):
    """Prints side-by-side results of running np.info on its inputs."""
    def info_as_ordered_dict(array):
        """Converts the output of np.info into an ordered dict."""
        import collections
        import io
        buffer = io.StringIO()
        np.info(array, output=buffer)
        # Split each "name: value" line at the first colon only.
        data = (
            item.split(':', 1) for item in buffer.getvalue().strip().split('\n'))
        return collections.OrderedDict(
            ((key, value.strip()) for key, value in data))
    one_dict = info_as_ordered_dict(one_array)
    another_dict = info_as_ordered_dict(another_array)
    # Column widths: widest name, and widest value for each array.
    name_w = max(len(name) for name in one_dict.keys())
    one_w = max(len(name) for name in one_dict.values())
    another_w = max(len(name) for name in another_dict.values())
    output = (
        f'{name:<{name_w}} : {one:>{one_w}} : {another:>{another_w}}'
        for name, one, another in zip(
            one_dict.keys(), one_dict.values(), another_dict.values()))
    print('\n'.join(output))
Create a small one dimensional array (consider using np.arange).
Look at the output of np.info on your array and on slices of it (use the [start:stop:step] indexing syntax, and make sure to try steps other than one).
In [ ]:
# Your code goes here
Every array has an underlying block of memory assigned to it. When we slice an array, rather than making a copy of it, NumPy makes a view, reusing the memory block, but interpreting it differently.
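As a rough illustration of "same memory, different interpretation", the sketch below compares the metadata of an array and one of its slices. The data_address helper is a hypothetical name introduced here; it reads the address from the array's standard __array_interface__ dictionary. The byte counts assume 8-byte items, which is why the dtype is set explicitly.

```python
import numpy as np

a = np.arange(10, dtype=np.int64)   # 8-byte items
b = a[2:9:3]                        # items 2, 5 and 8


def data_address(array):
    """Memory address of the array's first item (illustrative helper)."""
    return array.__array_interface__['data'][0]


print(a.shape, a.strides)   # (10,) (8,)
print(b.shape, b.strides)   # (3,) (24,)
# The view starts 2 items, i.e. 16 bytes, into the original buffer.
print(data_address(b) - data_address(a))   # 16
```

Only the shape, the strides and the starting address differ between the two arrays; no items were copied.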
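Let's take a look at what NumPy did for us in the above examples, and make sense of some of the changes to info.
The shape of an array is stored in its .shape attribute.
The strides, i.e. the number of bytes to skip in memory to get to the next item, are stored in the .strides attribute of any array.
The memory block the array uses is exposed through its .data attribute.
The size in bytes of each item is stored in the .itemsize attribute of its .dtype attribute, i.e. array.dtype.itemsize.
The type of the items is stored in the .dtype attribute.
Whether the items are contiguous in memory is stored in the .contiguous attribute of the array's .flags attribute.
Take a couple of minutes to familiarize yourself with the NumPy array's attributes discussed above: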
Create a small array and look at its .shape, .strides, .dtype, .flags and .data attributes.
For .dtype and .flags, store them into a separate variable, and use tab completion on those to explore their subattributes.
In [ ]:
# Your code goes here
Similarly to how we can change the shape, strides and data pointer of an array through slicing, we can change how its items are interpreted by changing its data type.
This is done by calling the array's .view() method, and passing it the new data type.
But before we go there, let's look a little closer at dtypes. You are hopefully familiar with the basic NumPy numerical data types:
Type Family | NumPy Defined Types | Character Codes |
---|---|---|
boolean | np.bool | '?' |
unsigned integers | np.uint8 - np.uint64 | 'u1', 'u2', 'u4', 'u8' |
signed integers | np.int8 - np.int64 | 'i1', 'i2', 'i4', 'i8' |
floating point | np.float16 - np.float128 | 'f2', 'f4', 'f8', 'f16' |
complex | np.complex64, np.complex128 | 'c8', 'c16' |
You can create a new data type by calling its constructor, np.dtype(), with either a NumPy defined type, or the character code.
Character codes can have '<' or '>' prepended, to indicate whether the type is little or big endian. If unspecified, native encoding is used, which for all practical purposes is going to be little endian.
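The following sketch shows the dtype constructor and a dtype view in action. The arrays and variable names are made up for illustration. The byte order is given explicitly wherever the result depends on it, so the outcome should not depend on the machine's native encoding.

```python
import numpy as np

# The same 2-byte unsigned integer type, specified in different ways.
native = np.dtype(np.uint16)
little = np.dtype('<u2')
big = np.dtype('>u2')
print(native.itemsize, little.itemsize, big.itemsize)   # 2 2 2

# Viewing the same bytes with a different dtype reinterprets them
# without copying: -1 as int8 has the same bit pattern as 255 as uint8.
signed = np.array([-1, 0, 1], dtype=np.int8)
unsigned = signed.view(np.uint8)
print(unsigned.tolist())            # [255, 0, 1]
print(unsigned.base is signed)      # True

# The bytes (0x01, 0x00) are 1 in little endian, 256 in big endian.
one = np.array([1], dtype='<u2')
print(one.view('>u2').tolist())     # [256]
```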
Let's play a little with dtype views:
Create a simple array, e.g. np.arange(4, dtype=np.uint16).
Take a view of type np.uint8 of your array. This will give you the raw byte contents of your array. Is this what you were expecting?
Find an array of np.uint8 values which, when viewed as a np.float32, give the values 1, -2, and 1/3.
In [ ]:
# Your code goes here
You typically construct your NumPy arrays using one of the many factory functions provided, np.array() being the most popular.
But it is also possible to call the np.ndarray object constructor directly.
You will typically not want to do this, because there are probably simpler alternatives.
But it is a great way of putting your understanding of views of arrays to the test!
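You can check the full documentation, but the np.ndarray constructor takes the following arguments that we care about:
shape: the shape of the new array,
dtype: the data type of its items,
buffer: the object whose memory the new array will use, e.g. another array or its .data attribute,
offset: the offset in bytes, from the start of the buffer, of the first item,
strides: the strides, in bytes, of the new array.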
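As a minimal sketch of calling the constructor directly, the example below builds a view of every other item of an array, starting at the second one. It relies on the documented shape, dtype, buffer, offset and strides parameters of np.ndarray; the array and the variable name odd are illustrative only.

```python
import numpy as np

a = np.arange(6, dtype=np.int64)

# Start one item into the buffer and step two items at a time.
# This describes the same memory as the slice a[1::2].
odd = np.ndarray(
    shape=(3,),
    dtype=a.dtype,
    buffer=a,
    offset=a.itemsize,
    strides=(2 * a.itemsize,),
)
print(odd.tolist())               # [1, 3, 5]
print(np.shares_memory(a, odd))   # True
```

Note that offset and strides are given in bytes, not in items, which is why both are multiples of a.itemsize.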
In [ ]:
# Your code goes here
So far we have stuck to one dimensional arrays. Things get substantially more interesting when we move into higher dimensions.
One way of getting views with a different number of dimensions is by using the .reshape() method of NumPy arrays, or the equivalent np.reshape() function.
The first argument to any of the reshape functions is the new shape of the array. When providing it, keep in mind:
The total number of items cannot change, so the product of the new dimensions must equal the size of the array.
By passing -1 for one of the new dimensions, you can have NumPy compute its value for you, but the other dimensions must be compatible with the calculated one being an integer.
.reshape() can also take an order= kwarg, which can be set to 'C' (as in the C programming language) or 'F' (for the Fortran programming language). These correspond to row and column major orders, respectively.
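A short sketch of how -1 behaves in a new shape, using a made-up 12 item array: NumPy fills in the missing dimension when the division is exact, and raises a ValueError when it is not.

```python
import numpy as np

a = np.arange(12)

print(a.reshape(3, -1).shape)      # (3, 4)
print(a.reshape(-1, 2, 3).shape)   # (2, 2, 3)

# 12 items cannot be split into 5 equal rows, so this fails.
try:
    a.reshape(5, -1)
    failed = False
except ValueError as error:
    failed = True
    print('ValueError:', error)
```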
Let's look at how multidimensional arrays are represented in NumPy with an exercise.
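Create a small 1D array, with a number of items that can be factored, e.g. 6 = 2 * 3.
Reshape it into a 2D array using order='C'. Try both possible combinations of rows and columns, e.g. (2, 3) and (3, 2). Look at the resulting arrays, and compare their metadata. Do you understand what's going on?
Do the same using order='F'. Can you see what the differences are?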
In [ ]:
# Your code goes here
As the examples show, an n-dimensional array will have an n item tuple as its .shape and .strides. The number of dimensions can be directly queried from the .ndim attribute.
The shape tells us how large the array is along each dimension, and the strides tell us how many bytes to skip in memory to get to the next item along each dimension.
When we reshape an array using C order, a.k.a. row major order, items along the later (higher-numbered) dimensions are closer together in memory. When we use Fortran order, a.k.a. column major order, it is items along the earlier (lower-numbered) dimensions that are closer together.
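The difference between the two orders shows up directly in the strides. The sketch below, with an illustrative 6 item array of 8-byte integers, reshapes the same memory both ways and prints the resulting metadata.

```python
import numpy as np

a = np.arange(6, dtype=np.int64)   # 8-byte items

c = a.reshape((2, 3), order='C')
f = a.reshape((2, 3), order='F')

print(c.tolist(), c.strides)   # [[0, 1, 2], [3, 4, 5]] (24, 8)
print(f.tolist(), f.strides)   # [[0, 2, 4], [1, 3, 5]] (8, 16)
print(c.ndim, f.ndim)          # 2 2

# Both are views of the same six items; only the metadata differs.
print(np.shares_memory(a, c), np.shares_memory(a, f))   # True True
```

In C order the last dimension has the smallest stride (8 bytes), while in Fortran order the first dimension does.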
In [ ]:
a = np.arange(12, dtype=float)
a
In [ ]:
a.reshape(4, 3).sum(axis=-1)
You can apply fancier functions than .sum(), e.g. let's compute the variance of each group:
In [ ]:
a.reshape(4, 3).var(axis=-1)
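To spell out what the reshape is doing in the cells above, this sketch compares the reshaped aggregation with an explicit loop over consecutive groups of three items. The names groups, sums and expected are illustrative.

```python
import numpy as np

a = np.arange(12, dtype=float)

# Each row of the reshaped view holds one group of 3 consecutive items.
groups = a.reshape(4, 3)
sums = groups.sum(axis=-1)

# The same result, computed explicitly one slice at a time.
expected = np.array([a[i:i + 3].sum() for i in range(0, len(a), 3)])

print(sums.tolist())                    # [3.0, 12.0, 21.0, 30.0]
print(np.array_equal(sums, expected))   # True
```

Aggregating over the last axis of the reshaped view collapses each group into a single value, without any Python-level loop.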
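Your turn to do a fancier reshaping: we will compute the average of a 2D array over non-overlapping rectangular patches:
Choose two small numbers m and n, e.g. 3 and 4.
Create a 2D array whose dimensions are multiples of those numbers, e.g. 15 x 24.
Reshape and aggregate it to compute the average over non-overlapping m x n tiles, e.g. a 5 x 6 array.
Hint: like .sum(), aggregation methods can take a tuple of integers as axis=, so you can do the whole thing in a single reshape from 2D to 4D, then aggregate back to 2D. If you find this confusing, doing two aggregations will also work.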
In [ ]:
# Your code goes here
Once we have a multidimensional array, rearranging the order of its dimensions is as simple as rearranging its .shape and .strides attributes. You could do this with np.ndarray, but it would be a pain. NumPy has a bunch of functions for doing that, but they are all watered down versions of np.transpose, which takes a tuple with the desired permutation of the array dimensions.
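As a small sketch of what np.transpose does to the metadata, the example below permutes the axes of an illustrative 3D array of 8-byte floats and prints the shape and strides before and after. No data is moved.

```python
import numpy as np

a = np.zeros((2, 3, 4), dtype=np.float64)   # 8-byte items
b = np.transpose(a, (2, 0, 1))              # new axis i is old axis (2, 0, 1)[i]

print(a.shape, a.strides)       # (2, 3, 4) (96, 32, 8)
print(b.shape, b.strides)       # (4, 2, 3) (8, 96, 32)
print(np.shares_memory(a, b))   # True
```

The shape and strides tuples are permuted in exactly the same way, which is all a transpose needs to do.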
Write a function roll_axis_to_end that takes an array and an axis, and makes that axis the last dimension of the array.
Then rewrite your function using np.ndarray.
In [ ]:
# Your code goes here
In [ ]:
# Your code goes here
Write two functions, stacked_column_vector and stacked_row_vector, that take a 1D array (the vector) and an integer n, and create a 2D view of the array that stacks n copies of the vector, either as columns or rows of the view.
Then write an outer_product function that takes two 1D vectors and computes their outer product.
In [ ]:
# Your code goes here
In the last exercise we used zero strides to reuse an item more than once in the resulting view. Let's try to build on that idea:
Write a function that takes a 1D array and a window integer value, and creates a 2D view of the array, each row a view through a sliding window of size window into the original array.
Hint: there are len(array) - window + 1 such "views through a window".
Another hint: Here's a small example expected run:
>>> sliding_window(np.arange(4), 2)
[[0, 1],
[1, 2],
[2, 3]]
In [ ]:
# Your code goes here
In [ ]:
from numpy.lib.stride_tricks import as_strided
np.info(as_strided)
Note that this function will not protect you, the way np.ndarray does, from accessing memory that is not indexed by the array the view is taken for. You may want to do that, but be wary of the world of segmentation faults you are getting yourself into!
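One possible way of using as_strided safely is sketched below for the sliding window case. The function name sliding_window_sketch and its argument checks are illustrative choices, not part of the workshop. The checks are there because as_strided itself will not complain about a shape that reaches past the end of the buffer.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def sliding_window_sketch(array, window):
    """2D read-only view whose rows are length-`window` windows of a 1D array."""
    if array.ndim != 1 or not 1 <= window <= len(array):
        raise ValueError('need a 1D array and 1 <= window <= len(array)')
    step, = array.strides
    return as_strided(
        array,
        shape=(len(array) - window + 1, window),
        strides=(step, step),   # moving one row or one column skips one item
        writeable=False,        # rows overlap, so writes would be error-prone
    )


print(sliding_window_sketch(np.arange(4), 2).tolist())
# [[0, 1], [1, 2], [2, 3]]
```

Passing writeable=False is a defensive choice: several positions of the view map to the same item in memory, so writing through it could have surprising effects.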